An RNN-based compensation method for Mandarin telephone speech recognition

Authors

  • Sen-Chia Chang
  • Shih-Chieh Chien
  • Chih-Chung Kuo
Abstract

This paper proposes a novel architecture that integrates recurrent neural network (RNN) based compensation and hidden Markov model (HMM) based speech recognition into a unified framework. The RNN estimates the additive bias that represents the telephone channel effect in the cepstral domain. Telephone channel effects are compensated by subtracting this bias from the cepstral coefficients of the input utterance. The integrated recognition system is trained with the MCE/GPD (minimum classification error/generalized probabilistic descent) method, using an objective function designed to minimize the recognition error rate. Experimental results for speaker-independent Mandarin polysyllabic word recognition show an error rate reduction of 21.5% compared to the baseline system.
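The two ideas in the abstract, subtracting an RNN-estimated bias from the cepstra and training against a smoothed classification-error loss, can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the Elman-style RNN, the averaging of per-frame outputs into one utterance-level bias, the weight shapes, and the names `estimate_bias`, `compensate` and `mce_loss`. The loss follows the commonly used MCE formulation (a sigmoid of a misclassification measure), which may differ from the exact objective the authors used.

```python
import numpy as np

def estimate_bias(cepstra, W_in, W_rec, W_out):
    """Estimate one additive cepstral bias vector for an utterance.

    Hypothetical Elman-style RNN; the paper's actual network is not
    described in the abstract. cepstra has shape (frames, dims).
    """
    hidden = np.zeros(W_rec.shape[0])
    outputs = []
    for frame in cepstra:
        hidden = np.tanh(W_in @ frame + W_rec @ hidden)
        outputs.append(W_out @ hidden)
    # Assumption: average the per-frame outputs into an utterance-level bias.
    return np.mean(outputs, axis=0)

def compensate(cepstra, bias):
    """Remove the channel effect by subtracting the bias from every frame."""
    return cepstra - bias

def mce_loss(scores, correct, gamma=1.0, eta=2.0):
    """Smoothed MCE loss for one utterance.

    scores: per-class discriminant scores (e.g. HMM log-likelihoods).
    correct: index of the true class.
    """
    scores = np.asarray(scores, dtype=float)
    competitors = np.delete(scores, correct)
    # Soft maximum over competing classes, computed stably.
    m = competitors.max()
    anti = m + np.log(np.mean(np.exp(eta * (competitors - m)))) / eta
    d = -scores[correct] + anti          # misclassification measure
    return 1.0 / (1.0 + np.exp(-gamma * d))  # sigmoid smoothing

# Usage with random, untrained weights (illustrative only).
rng = np.random.default_rng(0)
frames, dims, units = 50, 12, 8
cepstra = rng.normal(size=(frames, dims))
W_in = rng.normal(scale=0.1, size=(units, dims))
W_rec = rng.normal(scale=0.1, size=(units, units))
W_out = rng.normal(scale=0.1, size=(dims, units))

bias = estimate_bias(cepstra, W_in, W_rec, W_out)
clean = compensate(cepstra, bias)
loss = mce_loss(scores=[-120.0, -135.0, -128.0], correct=0)
```

In an integrated system of the kind the abstract describes, the gradient of such a loss would be propagated through the recognizer scores into both the HMM parameters and the RNN weights; that joint GPD update is omitted from this sketch.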


Similar resources

A robust RNN-based pre-classification for noisy Mandarin speech recognition

This paper addresses the problem of speech signal pre-classification for robust noisy speech recognition. A novel RNN-based pre-classification scheme for noisy Mandarin speech recognition is proposed. The RNN, which is trained to be insensitive to noise-level variation, is employed to classify each input frame into the three broad classes of initial, final and pure-noise. An on-line noise tracki...


Mandarin telephone speech recognition for automatic telephone number directory service

This paper discusses an HMM-based Mandarin telephone speech recognition method for implementing a prototype automatic telephone number directory service. It adopts the GPD/MCE training algorithm to train HMMs for 100 final-dependent syllable initials and 40 syllable finals. The SBR method is used to compensate for speaker and channel effects. Besides, an RNN-based pre-clas...


Speech Emotion Recognition Using Scalogram Based Deep Structure

Speech Emotion Recognition (SER) is an important part of speech-based Human-Computer Interface (HCI) applications. Previous SER methods rely on the extraction of features and training an appropriate classifier. However, most of those features can be affected by emotionally irrelevant factors such as gender, speaking styles and environment. Here, an SER method has been proposed based on a concat...


An RNN-based preclassification method for fast continuous Mandarin speech recognition

A novel recurrent neural network-based (RNN-based) frontend preclassification scheme for fast continuous Mandarin speech recognition is proposed in this paper. First, an RNN is employed to discriminate each input frame for the three broad classes of initial, final, and silence. A finite state machine (FSM) is then used to classify the input frame into four states including three stable states o...


Robust SBR method for adverse Mandarin speech recognition - Electronics Letters

An RNN-based robust signal bias removal (RRSBR) method is proposed for improving both the recognition performance and the computational efficiency of the SBR method for adverse Mandarin speech recognition. It differs from the SBR method in using three broad-class sub-codebooks to encode the feature vector of each frame and combining the three encoding residuals to form the frame-level s...



Publication date: 1998